(Vector and) Raster Data in R

Advanced Geospatial Data Processing for Social Scientists

Dennis Abel & Stefan Jünger

2025-04-28

Day Time Title
April 28 10:00-11:15 Introduction
April 28 11:15-11:30 Coffee Break
April 28 11:30-13:00 Raster data in R
April 28 13:00-14:00 Lunch Break
April 28 14:00-15:15 Raster data processing
April 28 15:15-15:30 Coffee Break
April 28 15:30-17:00 Graphical display of raster data in maps
April 29 10:00-11:15 Datacube processing I
April 29 11:15-11:30 Coffee Break
April 29 11:30-13:00 Datacube processing II & API access
April 29 13:00-14:00 Lunch Break
April 29 14:00-15:15 Data integration and linking (with survey data)
April 29 15:15-15:30 Coffee Break
April 29 15:30-17:00 Outlook and open session with own application

Refresher: geospatial data and their implementation in R

Why care about data types and formats?

There are differences in how spatial information is stored, processed, and visually represented.

  • Different commands for data import and manipulation
  • Spatial linking techniques and analyses partly determined by data format
  • Visualization of data can vary

So, always know what kind of data you are dealing with!

Vector and raster data

Sources: OpenStreetMap / GEOFABRIK (2018), City of Cologne (2014), and the Statistical Offices of the Federation and the Länder (2016) / Jünger, 2019

Vector data

Representing the world in vectors

The surface of the earth is represented by simple geometries and attributes.

Each object is defined by longitude (x) and latitude (y) values.

It could also include z coordinates…

Vector data: geometries

Every real-world feature is one of three types of geometry:

  • Points: discrete location (e.g., a tree)
  • Lines: linear feature (e.g., a river)
  • Polygons: enclosed areas (e.g., a city, country, administrative boundaries)

National Ecological Observatory Network (NEON), cited by Datacarpentry

Vector data: attribute tables

Only geometries means that we do not have any other information.

We must assign attributes to each geometry to hold additional information \(\rightarrow\) data tables called attribute tables.

  • Each row represents a geometric object, which we can also call observation, feature, or case
  • Each column holds an attribute or, in ‘our’ language, a variable

Vector data: attribute tables

File formats/extensions

  • GeoPackage .gpkg
  • Shapefile .shp
  • GeoJSON .geojson
  • Sometimes, vector data come even in a text format, such as CSV

Welcome to simple features

Several packages are out there to wrangle and visualize spatial and, especially, vector data within R. We will use the sf package (“simple features”).


Why?

simple features refers to a formal standard representing spatial geometries and supports interfaces to other programming languages and GIS systems (ISO 19125-1).

Illustration by Allison Horst

Load a vector data file

The first step is, of course, loading the data. We want to import the .geojson file for the administrative borders of the whole world called World_Countries.geojson.

# load library
library(sf)

# load data
# source: https://hub.arcgis.com/datasets/esri::world-countries-generalized/
hello_world <- sf::read_sf("./data/World_Countries.geojson")

What is this thing?

hello_world
Simple feature collection with 251 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: -89 xmax: 180 ymax: 83.6236
Geodetic CRS:  WGS 84
# A tibble: 251 × 6
     FID COUNTRY             ISO   COUNTRYAFF  AFF_ISO                  geometry
   <int> <chr>               <chr> <chr>       <chr>          <MULTIPOLYGON [°]>
 1     1 Afghanistan         AF    "Afghanist… "AF"    (((61.27655 35.60725, 61…
 2     2 Albania             AL    "Albania"   "AL"    (((19.57083 41.68527, 19…
 3     3 Algeria             DZ    "Algeria"   "DZ"    (((4.603354 36.88791, 4.…
 4     4 American Samoa      AS    "United St… "US"    (((-170.7439 -14.37555, …
 5     5 Andorra             AD    "Andorra"   "AD"    (((1.445836 42.60194, 1.…
 6     6 Angola              AO    "Angola"    "AO"    (((23.47611 -17.62584, 2…
 7     7 Anguilla            AI    "United Ki… "GB"    (((-63.16778 18.16445, -…
 8     8 Antarctica          AQ    " "         " "     (((-180 -84.30535, -179.…
 9     9 Antigua and Barbuda AG    "Antigua a… "AG"    (((-61.73806 16.98972, -…
10    10 Argentina           AR    "Argentina" "AR"    (((-71.85916 -41.01128, …
# ℹ 241 more rows

We can already plot it

plot(sf::st_geometry(hello_world))

This is the bounding box

Inspect your data: classics

There are no huge differences between the file we just imported and a regular data table.

# head of data table
head(hello_world, 2)
Simple feature collection with 2 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 19.28854 ymin: 29.40611 xmax: 74.91574 ymax: 42.66035
Geodetic CRS:  WGS 84
# A tibble: 2 × 6
    FID COUNTRY     ISO   COUNTRYAFF  AFF_ISO                           geometry
  <int> <chr>       <chr> <chr>       <chr>                   <MULTIPOLYGON [°]>
1     1 Afghanistan AF    Afghanistan AF      (((61.27655 35.60725, 61.29638 35…
2     2 Albania     AL    Albania     AL      (((19.57083 41.68527, 19.58195 41…

Inspect your data: spatial features

Besides our general data inspection, we may also want to check the spatial features of our import. This check includes the geometric type (points? lines? polygons?) and the coordinate reference system.

# type of geometry
sf::st_geometry(hello_world) 
Geometry set for 251 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -180 ymin: -89 xmax: 180 ymax: 83.6236
Geodetic CRS:  WGS 84
First 5 geometries:

Inspect your data: spatial features

Each polygon is defined by several connected points to build an enclosed area. Several polygons in one data frame have the sf type multipolygons. Just as the world consists of several states, the polygon of the world consists of several smaller polygons.

# extract the simple features column and further inspecting 
attr(hello_world, "sf_column") |> 
  dplyr::pull(hello_world, var = _) |> 
  dplyr::glimpse()
sfc_MULTIPOLYGON of length 251; first list element: List of 1
 $ :List of 1
  ..$ : num [1:708, 1:2] 61.3 61.3 61.4 61.4 61.4 ...
 - attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"

Inspect your data: spatial features

Remember: The Coordinate Reference System is critical. A crucial step is to check the CRS of your geospatial data.

# coordinate reference system
sf::st_crs(hello_world) 
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

Spatial joins

Often, we want to combine geospatial datasets from different sources. The choice of an appropriate method depends, once again, on the data format. In the world of vector data, we usually speak of “spatial joins” when linking two vector datasets. Let’s pull in more spatially narrowed data.

german_districts <- 
  sf::st_read(
    "./data/VG250_KRS.shp", 
    as_tibble = TRUE
  )

plot(german_districts["AGS"])

Spatial joins

Using the synthetic survey geocoordinates, we can craft a toy example to demonstrate spatial joins. This is how they look.

points <- 
  readRDS(
    "./data/synthetic_survey_coordinates.rds"
  ) |> 
  sf::st_transform(sf::st_crs(german_districts))

plot(sf::st_geometry(points))

Spatial joins

It would be nice if we had regional identifiers in our points data, but there aren’t any. This issue shows where spatial joins come in handy. We can use sf::st_join() to extract this information.

points_ags <-  sf::st_join(points, german_districts["AGS"])

points_ags
Simple feature collection with 2000 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 294988.7 ymin: 5261146 xmax: 904079.4 ymax: 6080921
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 2,000 × 4
      id           geometry foreigner AGS  
 * <int>        <POINT [m]>     <int> <chr>
 1     1 (853435.9 5892079)         0 12073
 2     2 (520533.4 5879316)         1 03357
 3     3 (430178.6 5689132)         0 05958
 4     4 (521463.3 5884327)         1 03357
 5     5 (525072.6 5397485)         1 08116
 6     6   (765168 5904791)         0 13071
 7     7 (475594.9 5803719)         0 05770
 8     8 (670010.2 5473386)         0 09574
 9     9 (669450.7 5362344)         0 09174
10    10 (827147.8 5388243)         1 09262
# ℹ 1,990 more rows

There’s more

In this course, we only use a subset of the options for spatial joins in sf. Please refer to the official sf cheat sheet.

Finally: Transforming the CRS

We have to change our CRS fairly often depending on our data sources. That’s easily doable in sf using the sf::st_transform(). Here’s a closer look.

sf::st_crs(german_districts)$epsg
[1] 25832

Let’s pretend we want to use the EPSG code 4326–no problem!

german_districts_4326 <-  sf::st_transform(german_districts, 4326)

sf::st_crs(german_districts_4326)$epsg
[1] 4326

Exercise 2_1: Vector Data Refresher

Raster data

Difference to vector data

Data Structure:

  • Other data format(s), different file extensions
  • Geometries do not differ within one dataset

Implications:

  • Other geospatial operations possible

Benefits:

  • Can be way more efficient and straightforward to process
  • It’s like working with simple tabular data

Visual difference between vector and raster data

What exactly are raster data?

  • Hold information on (most of the time) evenly shaped grid cells
  • Basically, a simple data table
  • Each cell represents one observation

Metadata

  • Information about geometries is globally stored
  • They are the same for all observations
  • Their location in space is defined by their cell location in the data table
  • Without this information, raster data were simple image files

Important metadata

Raster Dimensions: number of columns, rows, and cells

Extent: similar to bounding box in vector data

Resolution: the size of each raster cell

Coordinate reference system: defines where on the earth’s surface the raster layer lies

Setting up a raster dataset is easy

input_data <- matrix(sample(1:100, 16), nrow = 4)

raster_layer <- terra::rast(input_data)

raster_layer
class       : SpatRaster 
dimensions  : 4, 4, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : 0, 4, 0, 4  (xmin, xmax, ymin, ymax)
coord. ref. :  
source(s)   : memory
name        : lyr.1 
min value   :     6 
max value   :    88 

We can already plot it

terra::plot(raster_layer)

File formats/extensions

  • GeoTIFF tif
  • Gridded data .grd
  • Network common data format .nc
  • Esri grid .asc
  • Sometimes, raster data come even in a text format, such as CSV

Implementations in R

terra is the most commonly used package for raster data in R.

Some other developments, e.g., in the stars package, also implement an interface to simple features in sf. We will work with stars later in this course.

The terra package helps to employ zonal statistics. But the spatstat package is even more elaborated.

Loading raster tiffs (German Census Data 2022)

Loading raster data in terra is straightforward. For this purpose, we use the function terra::rast().

pop_grid_ger_2022 <- terra::rast("./data/z22/population.tif")

pop_grid_ger_2022
class       : SpatRaster 
dimensions  : 868, 652, 1  (nrow, ncol, nlyr)
resolution  : 1000.467, 1000.467  (x, y)
extent      : 274144.4, 926449.1, 5236495, 6104901  (xmin, xmax, ymin, ymax)
coord. ref. : ETRS89 / UTM zone 32N (EPSG:25832) 
source      : population.tif 
name        :    cat_0 
min value   :     3.00 
max value   : 21326.12 

Loading raster tiffs (German Census Data 2022)

Let’s load another dataset to compare structures because that’s really easy in terra.

age_under_18_grid_2022 <- terra::rast("./data/z22/age_under_18.tif")

age_under_18_grid_2022
class       : SpatRaster 
dimensions  : 868, 652, 1  (nrow, ncol, nlyr)
resolution  : 1000.467, 1000.467  (x, y)
extent      : 274144.4, 926449.1, 5236495, 6104901  (xmin, xmax, ymin, ymax)
coord. ref. : ETRS89 / UTM zone 32N (EPSG:25832) 
source      : age_under_18.tif 
name        :  cat_0 
min value   :   0.62 
max value   : 100.00 

Compare layers by plotting

terra::plot(
  pop_grid_ger_2022
)

terra::plot(
  age_under_18_grid_2022
)

Simple statistics

Working with raster data is straightforward

  • quite speedy
  • yet not as comfortable as working with sf objects

For example, to calculate the mean, we can use the following:

terra::global(pop_grid_ger_2022, fun = "mean", na.rm = TRUE)
          mean
cat_0 393.3016

Get all values as a vector

We can also extract the values of a raster directly as a vector:

all_raster_values <- terra::values(pop_grid_ger_2022)

mean(all_raster_values, na.rm = TRUE)
[1] 393.3016

Nevertheless, although raster data are simple data tables, working with them is a bit different compared to, e.g., simple features.

Combining raster layers to calculate new values

young_people <- 
  pop_grid_ger_2022 * 
  (age_under_18_grid_2022 / 100)

young_people
class       : SpatRaster 
dimensions  : 868, 652, 1  (nrow, ncol, nlyr)
resolution  : 1000.467, 1000.467  (x, y)
extent      : 274144.4, 926449.1, 5236495, 6104901  (xmin, xmax, ymin, ymax)
coord. ref. : ETRS89 / UTM zone 32N (EPSG:25832) 
source(s)   : memory
varname     : population 
name        :       cat_0 
min value   :    1.223083 
max value   : 3797.164478 

Loading raster data with stars

Later, we will work with multidimensional raster data cubes. While it is possible to work them in terra, stars is way more elaborated (we’ll discuss it tomorrow in more detail). So, here is an example of how to load raster data into stars.

pop_grid_ger_2022_stars <- 
  stars::read_stars("./data/z22/population.tif")

pop_grid_ger_2022_stars
stars object with 2 dimensions and 1 attribute
attribute(s):
                Min.  1st Qu.   Median     Mean  3rd Qu.     Max.   NA's
population.tif     3 32.62876 99.43739 393.3016 331.5063 21326.12 355592
dimension(s):
  from  to  offset delta                refsys point x/y
x    1 652  274144  1000 ETRS89 / UTM zone 32N FALSE [x]
y    1 868 6104901 -1000 ETRS89 / UTM zone 32N FALSE [y]

stars objects can also be plotted

plot(pop_grid_ger_2022_stars)

Finally: Transforming the CRS

As mentioned before, we may also have to change the CRS of our raster data. Returning to terra, this is not called ‘transforming’ but ‘projecting’ (which makes sense, right?).

terra::crs(pop_grid_ger_2022, describe = TRUE)$code
[1] "25832"

Let’s pretend we want to use the EPSG code 3035–no problem!

pop_grid_ger_2022_3035 <- terra::project(pop_grid_ger_2022, "EPSG:3035")

terra::crs(pop_grid_ger_2022_3035, describe = TRUE)$code
[1] "3035"

Exercise 2_2: Basic Raster Operations